Extensible Grouping and Aggregation for Data Reconciliation

Authors

  • Eike Schallehn
  • Kai-Uwe Sattler
  • Gunter Saake

Abstract

New applications from the areas of analytical data processing and data integration require powerful features to condense and reconcile available data. Object-relational and other data management systems available today provide only limited concepts to deal with these requirements. The general concept of grouping and aggregation appears to be a fitting paradigm for a number of these issues, but in its common form of equality-based groups and restricted aggregate functions, a number of problems remain unsolved. Various extensions to this concept have been introduced in recent years, especially regarding user-defined functions for aggregation and for the derivation of grouping properties. We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance the integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications and illustrate the approach by introducing new concepts for similarity-based duplicate detection and elimination.
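The abstract describes user-defined grouping and aggregation only at the level of interfaces. The Python sketch below illustrates the general idea under stated assumptions; it is not the paper's SQL extension or its actual interfaces. The names `similarity_groups`, `Aggregate`, `LongestValue`, `MaxValue`, and `aggregate`, the 0.8 threshold, and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure are all hypothetical choices made for illustration. Rows with similar titles are placed in one group (the transitive closure of pairwise similarity, computed with union-find), and each group is then reduced to one reconciled row by per-column user-defined aggregates, which is one way duplicate detection and elimination can be phrased as grouping plus aggregation.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity predicate: difflib's ratio on lowercased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def similarity_groups(rows: Sequence[Row], key: Callable[[Row], str],
                      threshold: float = 0.8) -> List[List[Row]]:
    """Hypothetical user-defined grouping: groups are the transitive closure of
    pairwise similarity, computed with union-find (O(n^2) comparisons)."""
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similar(key(rows[i]), key(rows[j]), threshold):
                parent[find(i)] = find(j)  # merge the two groups

    # Collect rows by their group representative, in first-occurrence order.
    groups: Dict[int, List[Row]] = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row)
    return list(groups.values())


class Aggregate:
    """Hypothetical user-defined aggregate interface (init / iterate / finalize)."""

    def init(self) -> Any:
        return None

    def iterate(self, state: Any, value: Any) -> Any:
        raise NotImplementedError

    def finalize(self, state: Any) -> Any:
        return state


class LongestValue(Aggregate):
    """Reconcile conflicting strings by keeping the longest non-null one."""

    def iterate(self, state: Optional[str], value: Optional[str]) -> Optional[str]:
        if value is None:
            return state
        return value if state is None or len(value) > len(state) else state


class MaxValue(Aggregate):
    """Keep the largest non-null value."""

    def iterate(self, state: Any, value: Any) -> Any:
        if value is None:
            return state
        return value if state is None or value > state else state


def aggregate(group: Sequence[Row], aggs: Dict[str, Aggregate]) -> Row:
    """Apply one aggregate per column to a group, yielding one reconciled row."""
    result: Row = {}
    for column, agg in aggs.items():
        state = agg.init()
        for row in group:
            state = agg.iterate(state, row.get(column))
        result[column] = agg.finalize(state)
    return result


# Hypothetical input: the first two rows are near-duplicates of each other.
rows = [
    {"title": "Extensible Grouping and Aggregation", "year": 2001},
    {"title": "extensible grouping & aggregation", "year": None},
    {"title": "Duplicate Detection", "year": 1999},
]
for group in similarity_groups(rows, key=lambda r: r["title"]):
    print(aggregate(group, {"title": LongestValue(), "year": MaxValue()}))
# {'title': 'Extensible Grouping and Aggregation', 'year': 2001}
# {'title': 'Duplicate Detection', 'year': 1999}
```

Using the transitive closure means a chain of pairwise-similar rows lands in one group even if its endpoints are dissimilar; this is one simple design choice, and stricter alternatives (for example, requiring every pair within a group to be similar) are also possible. The pairwise loop is quadratic in the number of rows, so this sketch only suits small inputs.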

Similar resources

Extensible and Similarity-based Grouping for Data Integration

The general concept of grouping and aggregation appears to be a fitting paradigm for various issues in data integration, but in its common form of equality-based grouping a number of problems remain unsolved. We propose a generic approach to user-defined grouping as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms. Furthermore, we ...

Extensible and Similarity-Based Grouping for Data Integration

Data integration as required in a variety of applications like data warehousing, information system integration etc. makes great demands regarding features to deal with overlapping and inconsistent data. Object-relational and other data management systems available today provide only limited concepts to deal with these requirements. The general concept of grouping and aggregation appears to be ...

On-Line Nonlinear Dynamic Data Reconciliation Using Extended Kalman Filtering: Application to a Distillation Column and a CSTR

Extended Kalman Filtering (EKF) is a nonlinear dynamic data reconciliation (NDDR) method. One of its main advantages is its suitability for on-line applications. This paper presents an on-line NDDR method using EKF. It is implemented for two case studies, temperature measurements of a distillation column and concentration measurements of a CSTR. In each time step, random numbers with zero m...

Paper 928 Complex Group-By Queries for XML

The popularity of XML as a data exchange standard has led to the emergence of powerful XML query languages like XQuery [21] and studies on XML query optimization. Of late, there is considerable interest in analytical processing of XML data (e.g.,[2, 3]). As pointed out by Borkar and Carey in [3], even for data integration, there is a compelling need for performing various group-by style aggrega...

GGRA: a grouped gossip-based reputation aggregation algorithm

An important issue in P2P networks is the existence of malicious nodes that decrease the performance of such networks. A reputation system, in which nodes are ranked based on their behavior, is one of the proposed solutions to detect and isolate malicious (low-ranked) nodes. Gossip Trust is an interesting previously proposed algorithm for reputation aggregation in P2P networks based on t...

Publication date: 2001